Bipartite spectral graph partitioning for clustering dialect varieties and detecting their linguistic features
Abstract
In this study we use bipartite spectral graph partitioning to simultaneously cluster varieties and identify their most distinctive linguistic features in Dutch dialect data. While clustering geographical varieties with respect to their features, e.g. pronunciation, is not new, the simultaneous identification of the features which give rise to the geographical clustering presents novel opportunities in dialectometry. Earlier methods aggregated sound differences and clustered on the basis of aggregate differences. The determination of the significant features which co-vary with cluster membership was carried out on a post hoc basis. Bipartite spectral graph clustering simultaneously seeks groups of individual features which are strongly associated, even while seeking groups of sites which share subsets of these same features. We show that the application of this method results in clear and sensible geographical groupings and discuss and analyze the importance of the concomitant features.
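The co-clustering idea described in the abstract can be illustrated with a minimal sketch of standard bipartite spectral bipartitioning. This is not necessarily the authors' exact procedure. The sketch normalizes a site-by-feature matrix by its row and column degrees, takes the second pair of singular vectors, and splits sites and features together. The function name, the toy matrix, and the split at zero (in place of a k-means step) are assumptions made for illustration; none of them comes from the Dutch dialect data.

```python
import numpy as np


def bipartite_spectral_bipartition(A):
    """Split rows (sites) and columns (features) of A into two co-clusters.

    A is a nonnegative site-by-feature matrix; A[i, j] > 0 means that
    feature j occurs at site i. Hypothetical helper, for illustration only.
    """
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # site degrees (row sums)
    d2 = A.sum(axis=0)  # feature degrees (column sums)
    if np.any(d1 == 0) or np.any(d2 == 0):
        raise ValueError("every site and every feature needs at least one edge")

    d1_inv_sqrt = 1.0 / np.sqrt(d1)
    d2_inv_sqrt = 1.0 / np.sqrt(d2)

    # Degree-normalized matrix D1^{-1/2} A D2^{-1/2}.
    An = d1_inv_sqrt[:, None] * A * d2_inv_sqrt[None, :]

    # Singular values come back in descending order; the first pair is
    # trivial, the second pair carries the two-way partition.
    U, s, Vt = np.linalg.svd(An, full_matrices=False)

    # Rescale back so that sites and features live in one shared embedding.
    z_sites = d1_inv_sqrt * U[:, 1]
    z_features = d2_inv_sqrt * Vt[1, :]

    # Assumption: split at zero instead of running k-means on the embedding.
    site_labels = (z_sites > 0).astype(int)
    feature_labels = (z_features > 0).astype(int)
    return site_labels, feature_labels


# Toy data (hypothetical): 6 sites x 4 features, two groups joined by a
# single weak link (site 2 also shows feature 2).
A_toy = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

site_labels, feature_labels = bipartite_spectral_bipartition(A_toy)
for k in (0, 1):
    sites = np.flatnonzero(site_labels == k).tolist()
    features = np.flatnonzero(feature_labels == k).tolist()
    print(f"co-cluster {k}: sites {sites}, features {features}")
```

Because sites and features are embedded in the same vector, each cluster of sites comes with the features that characterize it, so no post hoc feature analysis is needed. Which co-cluster receives label 0 or 1 depends on the arbitrary sign of the singular vectors.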
Similar resources
Hierarchical bipartite spectral graph partitioning to cluster dialect varieties and determine their most important linguistic features
In this study we apply a hierarchical bipartite spectral graph partitioning method to a Dutch dialect dataset to cluster dialect varieties and determine the concomitant sound correspondences. An important advantage of this clustering method over other dialectometric methods is that the linguistic basis is simultaneously determined, bridging the gap between traditional and quantitative dialectol...
Hierarchical Spectral Partitioning of Bipartite Graphs to Cluster Dialects and Identify Distinguishing Features
In this study we apply hierarchical spectral partitioning of bipartite graphs to a Dutch dialect dataset to cluster dialect varieties and determine the concomitant sound correspondences. An important advantage of this clustering method over other dialectometric methods is that the linguistic basis is simultaneously determined, bridging the gap between traditional and quantitative dialectology. ...
Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences in dialectology
In this study we used bipartite spectral graph partitioning to simultaneously cluster varieties and sound correspondences in Dutch dialect data. While clustering geographical varieties with respect to their pronunciation is not new, the simultaneous identification of the sound correspondences giving rise to the geographical clustering presents a novel opportunity in dialectometry. Earlier metho...
Patterns of language variation and underlying linguistic features: a new dialectometric approach
For almost forty years quantitative methods have been applied to the analysis of dialect variation: these methods focused mostly on identifying the most important dialectal groups using an aggregate analysis of the linguistic data (Séguy 1973; Goebl 1984; Nerbonne et al. 1999). While viewing dialect differences at an aggregate level certainly gives a more comprehensive view than the analysis of...
Analyzing phonetic variation in the traditional English dialects: Simultaneously clustering dialects and phonetic features
This study explores the linguistic application of bipartite spectral graph partitioning, a graph-theoretic technique that simultaneously identifies clusters of similar localities as well as clusters of features characteristic of those localities. We compare the results using this approach to previously published results on the same dataset using cluster and principal component analysis (Shacklet...
Journal title:
- Computer Speech & Language
Volume: 25
Publication date: 2011